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Abstract One of the most influential recent results in network analysis is that many 
natural networks exhibit a power-law or log-normal degree distribution. This has in- 
spired numerous generative models that match this property. However, more recent 
work has shown that while these generative models do have the right degree distri- 
bution, they are not good models for real life networks due to their differences on 
other important metrics like conductance. We believe this is, in part, because many 
of these real-world networks have very different Joint degree distributions, i.e. the 
probabihty that a randomly selected edge will be between nodes of degree k and I. 
Assortativity is a sufficient statistic of the joint degree distribution, and it has been 
previously noted that social networks tend to be assortative, while biological and 
technological networks tend to be disassortative. 

We suggest understanding the relationship between network structure and the 
joint degree distribution of graphs is an interesting avenue of further research. An 
important tool for such studies are algorithms that can generate random instances 
of graphs with the same joint degree distribution. This is the main topic of this pa- 
per and we study the problem from both a theoretical and practical perspective. We 
provide an algorithm for constructing simple graphs from a given joint degree distri- 
bution, and a Monte Carlo Markov Chain method for sampling them. We also show 
that the state space of simple graphs with a fixed degree distribution is connected via 
end point switches. We empirically evaluate the mixing time of this Markov Chain 
by using experiments based on the autocorrelation of each edge. These experiments 
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show that our Markov Chain mixes quickly on real graphs, allowing for utilization 
of our techniques in practice. 



1 Introduction 

Graphs are widely recognized as the standard modeling language for many complex 
systems, including physical infrastructure (e.g., Internet, electric power, water, and 
gas networks), scientific processes (e.g., chemical kinetics, protein interactions, and 
regulatory networks in biology starting at the gene levels through ecological sys- 
tems), and relational networks (e.g., citation networks, hyperlinks on the web, and 
social networks). The broader adoption of the graph models over the last decade, 
along with the growing importance of associated applications, calls for descriptive 
and generative models for real networks. What is common among these networks? 
How do they differ statistically? Can we quantify the differences among these net- 
works? Answering these questions requires understanding the topological proper- 
ties of these graphs, which have lead to numerous studies on many "real-world" 
networks from the Internet to social, biological and technological networks [18]. 

Perhaps the most prominent theme in these studies is the skewed degree dis- 
tribution; real-world graphs have a few vertices with very high degree and many 
vertices with small degree. There is some dispute as to the exact distribution, 
some have called it power-law [5, 18], some log-normal [4, 51, 41, 8], and but 
all agree that it is 'heavy-tailed' [17, 54]. The ubiquity of this distribution has 
been a motivator for many different generative models and is often used as a 
metric for the quality of the model. Models like preferential attachment [5], the 
copying model [31], the Barabasi hierarchical model [53], forest-fire model, the 
Kronecker graph model [33], geometric preferential attachment [19] and many 
more [34, 59, 11] study the expected degree distribution and use the results to argue 
for the strength of their method. Many of these models also match other observed 
features, such as small diameter or densification [28]. However, recent studies com- 
paring the generative models with real networks on metrics like conductance [35], 
core numbers [13] and clustering coefficients [30] show that the models do not 
match other important features of the networks. 

The degree distribution alone does not define a graph. McKay's estimate [39] 
shows that there may be exponentially many graphs with the same degree distribu- 
tion. However, models based on degree distribution are commonly used to compute 
statistically significant structures in a graph. For example, the modularity metric for 
community detection in graphs [43, 42] assumes a null hypothesis for the structure 
of a graph based on its degree distribution, namely that probability of an edge be- 
tween vertex v, and Vj is proportional to c/,c/,, where c/, and dj represent the degrees 
of vertices v, and vj. The modularity of a group of vertices is defined by how much 
their structure deviates from the null hypothesis, and a higher modularity signifies a 
better community. The key point here is that the null hypothesis is solely based on 
its degree distribution and therefore might be incorrect. Degree distribution based 
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models are also used to predict graph properties [40, 2, 15, 14, 16], benchmark [32], 
and analyze the expected run time of algorithms [7] . 

These studies improve our understanding of the relationship between the degree 
distribution and the structure of a graph. The shortcomings of these studies give in- 
sight into what other features besides the degree distribution would give us a better 
grasp of a graph's structure. For example, the degree assortativity of a network mea- 
sure whether nodes attach to other similar or dissimilar vertices. This is not spec- 
ified by the degree distribution, yet studies have shown that social networks tend 
to be assortative, while biological and technological networks tend to be dissorta- 
tive [47, 46]. An example of recent work using assortativity is [30]. In this study, 
a high assortativity is assumed for connections that generate high clustering coef- 
ficients, and this, in addition to preserving the degree distribution, results in very 
reaUstic instances of real-world graphs. Another study that has looked at the joint 
degree distribution is dK-gmphs [38]. They propose modeling a graph by looking 
at the distribution of the structure of all sized k subsets of vertices, where d = 1 are 
vertex degrees, J = 2 are edge degrees (the joint degree distribution), d = 3 is the 
degree distribution of triangles and wedges, and so on. It is an interesting idea, as 
clearly the riK distribution contains all information about the graph, but it is far too 
detailed as a model. At what d value does the additional information become less 
useful? 

One way to enhance the results based on degree distribution is to use a more re- 
strictive feature such as the joint degree distribution. Intuitively, if degree distribu- 
tion of a graph describes the probability that a vertex selected uniformly at random 
will be of degree k then its joint degree distribution describes the probability that 
a randomly selected edge will be between nodes of degree k and /. We will use a 
slightly different concept, the joint degree matrix, where the total number of nodes 
and edges is specified, and the numbers of edges between each set of degrees is 
counted. Note that while the joint degree distribution uniquely defines the degree 
distribution of a graph up to isolated nodes, graphs with the same degree distribu- 
tion may have very different joint degree distributions. We are not proposing that the 
joint degree distribution be used as a stand alone descriptive model for generating 
networks. We believe that understanding the relationship between the joint degree 
distribution and the network structure is important, and that having the capabihty 
to generate random instances of graphs with the same joint degree distribution will 
help enable this goal. Experiments on real data are valuable, but also drawing con- 
clusions only based on a limited data may be misleading, as the graphs may all be 
biased the same way. For a more rigorous study, we need a sampUng algorithm that 
can generate random instances in a reasonable time, which is the motivation of this 
work. 

The primary questions investigated by this paper are: Given a joint degree dis- 
tribution and an integer n, does the joint degree distribution correspond to a real 
labeled graph? If so, can one construct a graph of size n with that joint degree dis- 
tribution? Is it possible to construct or generate a uniformly random graph with that 
same joint degree distribution? We address these problems from both a theoretical 
and from an empirical perspective. In particular, being able to uniformly sample 
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graphs allows one to empirically evaluate which other graph features, like diameter, 
or eigenvalues, are correlated with the joint degree distribution. 

Contributions 

We make several contributions to this problem, both theoretically and experimen- 
tally. First, we discuss the necessary and sufficient conditions for a given joint degree 
vector to be graphical. We prove that these conditions are sufficient by providing a 
new constructive algorithm. Next, we introduce a new configuration model for the 
joint degree matrix problem which is a natural extension of the configuration model 
for the degree sequence problem. Finally, using this configuration model, we de- 
velop Markov Chains for sampUng both pseudographs and simple graphs with a 
fixed joint degree matrix. A pseudograph allows multiple edges between two nodes 
and self-loops. We prove the correctness of both chains and mixing time for the 
pseudograph chain by using previous work. The mixing time of the simple graph 
chain is experimentally evaluated using autocorrelation. 

In practice, Monte Carlo Markov Chains are a very popular method for sampling 
from difficult distributions. However, it is often very difficult to theoretically evalu- 
ate the mixing time of the chain, and many practitioners simply stop the chain after 
5,000, 10,000 or 20,000 iterations without much justification. Our experimental de- 
sign with autocorrelation provides a set of statistics that can be used as a justification 
for choosing a stopping point. Further, we show one way that the autocorrelation 
technique can be adapted from real-valued samples to combinatorial samples. 



2 Related Work 

The related work can be roughly divided into two categories: constructing and sam- 
phng graphs with a fixed degree distribution using sequential importance sampling 
or Monte Carlo Markov Chain methods, and experimental work on heuristics for 
generating random graphs with a fixed joint degree distribution. 

The methods for constructing graphs with a given degree distribution are primar- 
ily either reductions to perfect matchings or sequential sampling methods. There are 
two popular perfect matching methods. The first is the configuration model [10, IJ: ^ 
mini- vertices are created for each degree k vertex, and all the mini- vertices are con- 
nected. Any perfect matching in the configuration graph corresponds to a graph with 
the correct degree distribution by merging all of the identified mini- vertices. This al- 
lows multiple edges and self-loops, which are often undesirable. See Figure 1. The 
second approach, the gadget configuration model, prevents multi-edges and self- 
loops by creating a gadget for each vertex. If v,- has degree di, then it is replaced 
with a complete bipartite graph (f/;, V^) with = n — 1 — J,- and \Vi\ = n — 1. Ex- 
actly one node in each Vi is connected to each other Vj, representing edge {i,j) [26]. 
Any perfect matching in this model corresponds exactly to a simple graph by us- 
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ing the edges in the matching that correspond with edges connecting any Vt to any 
Vj. We use a natural extension of the first configuration model to the joint degree 
distribution problem. 




a 



Fig. 1 On the left, we see an example of the configuration model of the degree distribution of 
the graph on the right. The edges corresponding to that graph are bold. Each vertex is split into a 
number of mini-vertices equal to its degree, and then all mini-vertices are connected. Not all edges 
are shown for clarity. 

There are also sequential sampling methods that will construct a graph with a 
given degree distribution. Some of these are based on the necessary and sufficient 
Erdos-Gallai conditions for a degree sequence to be graphical [9], while others fol- 
low the method of Steger and Wormald [6, 57, 55, 24, 27]. These combine the con- 
struction and sampling parts of the problem and can be quite fast. The current best 
work can sample graphs where dmax = 0(m^/'*~^) in 0{mdmax) time [6]. 

Another approach for sampling graphs with a given degree distribution is to use a 
Monte Carlo Markov Chain method. There is significant work on sampling perfect 
matchings [25, 12]. There has also been work specifically targeted at the degree 
distribution problem. Kannan, Tetali and Vempala [26] analyze the mixing time of 
a Markov Chain that mixes on the configuration model, and another for the gadget 
configuration model. Gkantsidis, Mihail and Zegura [21] use a Markov Chain on the 
configuration model, but reject any transition that creates a self-loop, multiple edge 
or disconnects the graph. Both of these chains use the work of Taylor [58] to argue 
that the state space is connected. 

Amanatidis, Green and Mihail study the problems of when a given joint degree 
matrix has graphical representation and, further, when it has connected graphical 
representation [3]. They give necessary and sufficient conditions for both of these 
problems, and constructive algorithms. In Section 2, we give a simpler constructive 
algorithm for creating a graphical representation that is based on solving the degree 
sequence problem instead of alternating structures. 

Another vein of related work is that of Mahadevan et al. who introduce the con- 
cept of dK-senes |38, 37]. In this model, d refers to the dimension of the distribu- 
tion and 2K is the joint degree distribution. They propose a heuristic for generating 
random 2/r-graphs for a fixed 2K distribution via edge rewirings. However, their 
method can get stuck if there exists a degree in the graph for which there is only 1 
node with that degree. This is because the state space is not connected. We provide 
a theoretically sound method of doing this. 
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Finally, Newman also studies the problem of fixing an assortativity value, finding 
a joint remaining degree distribution with that value, and then sampling a random 
graph with that distribution using Markov Chains [47, 46]. His Markov Chain starts 
at any graph with the correct degree distribution and converges to a pseudograph 
with the correct joint remaining degree distribution. By contrast, our work provides 
a theoretically sound way of constructing a simple graph with a given joint degree 
distribution first, and our Markov Chain only has simple graphs with the same joint 
degree distribution as its state space. 



3 Notation and Definitions 

Formally, a degree distribution of a graph is the probability that a node chosen at 
random will be of degree k. Similarly, the joint degree distribution is the probabiUty 
that a randomly selected edge will have end points of degree k and /. In this paper, 
we are concerned with constructing graphs that exactly match these distributions, so 
rather than probabilities, we will use a counting definition below and call it the joint 
degree matrix. In particular, we will be concerned with generating simple graphs 
that do not contain multiple edges or self-loops. Any graph that may have multiple 
edges or self loops will be referred to as a pseudograph. 

Definition 1. The degree vector (DV) d{G) of a graph G is a vector where d(G)k is 
the number of nodes of degree k in G. 

A generic degree vector will be denoted by 2>. 

Definition 2. The joint degree matrix (JDM) / (G) of a graph G is a matrix where 
^ {G)k,i is exactly the number of edges between nodes of degree k and degree / in 
G. 

A generic joint degree matrix will be denoted by ^/ . Given a joint degree matrix, 
^ , we can recover the number of edges in the graph as m = Lr=i '^T=k r_/k,i- We 
can also recover the degree vector as = j(c/A:,)t + c^k,l)- The term is 
added twice because kQik is the number of end points of degree k and the edges in 
^k.k contribute two end points. 

The number of nodes, n is then YX=\ "^k- This count does not include any degree 
vertices, as these have no edges in the joint degree matrix. Given n and m, we can 
easily get the degree distribution and joint degree distribution. They are P{k) = \'2!k 
while P{k,l) = )^^k,i - Note that P(k) is not quite the marginal of P{k,l) although 
it is closely related. 

The Joint Degree Matrix Configuration Model 

We propose a new configuration model for the joint degree distribution problem. 
Given ^ and its corresponding ^ we create k labeled mini- vertices for every ver- 
tex of degree k. In addition, for every edge with end points of degree k and / we 
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create two labeled mini-end points, one of class k and one of class /. We connect 
all degree k mini -vertices to the class k mini-end points. This forms a complete 
bipartite graph for each degree, and each of these forms a connected component 
that is disconnected from all other components. We will call each of these compo- 
nents the "^-neighborhood". Notice that there are mini-vertices of degree k, and 
k^k = c/k,k+Y,i c^k.l corresponding mini-end points in each A:-neighborhood. This 
is pictured in Figure 2. Take any perfect matching in this graph. If we merge each 
pair of mini-end points that correspond to the same edge, we will have some pseu- 
dograph that has exactly the desired joint degree matrix. This observation forms the 
basis of our sampling method. 

degree k minivertices 

class k endpoints 
class I endpoints 

degree I minivertices 

Fig. 2 The joint degree matrix configuration model. This shows just two degree neighborhoods of 
the joint degree matrix configuration model. Each vertex of degree k is split into k mini-vertices 
which are represented by the circles. These then form a complete bipartite component when they 
are coimected with the class k end points, the squares. Each degree neighborhood is completely 
disconnected from all others. Not aU edges are included for clarity. 





4 Constructing Graphs with a Given Joint Degree Matrix 

The Erdos-Gallai condition is a necessary and sufficient condition for a degree se- 
quence to be reaUzable as a simple graph. 

Theorem 1. Erdos-Gallai A degree sequence d = {d\^d2, - ■ ■ dn} sorted in non- 
increasing order is graphical if and only if for every k <n, YJi=idi < k{k— 1) -|- 
V=k+i^^idi,k). 

The necessity of this condition comes from noting that in a set of vertices of 
size k, there can be at most (2) internal edges, and for each vertex v not in the 
subset, there can be at most min{d{v),k} edges entering. The condition consid- 
ers each subset of decreasing degree vertices and looks at the degree requirements 
of those nodes. If the requirement is more than the available edges, the sequence 
cannot be graphical. The sufficiency is shown via the constructive Havel-Hakimi 
algorithm [23, 22]. 

The existence of the Erdos-Gallai condition inspires us to ask whether similar 
necessary and sufficient conditions exist for a joint degree matrix to be graphical. 
The following necessary and sufficient conditions were independently studied by 
Amanatidis et al. [3]. 
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Theorem 2. Let ^ he given and Si he the associated degree distrihution. ^ can 
be realized as a simple graph if and only if {1 ) is integer-valued for all k and (2) 
\/k,l, ifk ^ I then < S>kSi- For each k, /k,k < Ci)- 

The necessity of these conditions is clear. The first condition requires that there 
are an integer number of nodes of each degree value. The next two are that the 
number of edges between nodes of degree k and / (or k and k) are not more than 
the total possible number of A: to / edges in a simple graph defined by the marginal 
degree sequences. Amanatidis et al. show the sufficiency through a constructive 
algorithm. We will now introduce a new algorithm that runs in 0{m) time. 

The algorithm proceeds by building a nearly regular graph for each class of 
edges, Assume that k^l for simplicity. Each of the nodes of degree k 
receives Y j^k.i I Sk\ edges, while f^i mod £^/, each have an extra edge. Similarly, 
the / degree nodes have [ /^.i edges, with ^^^i mod having 1 extra. We can 
then construct a simple bipartite graph with this degree sequence. This can be done 
in linear time in the number of edges using queues as is discussed after Lemma 1 . 
If ^ = /, the only differences are that the graph is no longer bipartite and there are 
2 J'k,k end points to be distributed among 2!^ nodes. To find a simple nearly regular 
graph, one can use the Havel-Hakimi [22, 23] algorithm in 0{^k,k) time by using 
the degree sequence of the graph as input to the algorithm. 

We must show that there is a way to combine all of these nearly-regular graphs to- 
gether without violating any degree constraints. Let d = {di,d2,---dn) be the sorted 
non-increasing order degree sequence from S'. Let dy denote the residual degree 
sequence where the residual degree of a vertex v is dy minus the number of edges 
that currently neighbor v. Also, let denote the number of nodes of degree k that 
have non-zero residual degree, i.e. = lLdj=k^{dj 0)- 



Algorithm 1 Greedy Graph Construction with a Fixed JDM, Input: ^,n,m, 2 

1: for = n - ■ ■ 1 and I = k - l do 

2: it I then 

3: Let a = ^k,l mod &k and b = ^k,i mod &i 

5: Construct a simple bipartite graph B with degree sequence Xi ■ ■ • • -y^^ 

6: else 

7: hetc = 2^kk mod^t 

8: Letxi---Xc= [^^J + 1 andxc+i • • -x^^ = L^^J 

9 : Construct a simple graph B with the degree sequence xi ■ ■ ■ 

10: end if 

1 1 : Place B into G by matching the nodes of degree k with higher residual degree with xi ■ --Xa 
and those of degree / with higher residual degree with yi ■■ -yi,. The other vertices in B can 
be matched in any way with those in G of degree k and / 

12: Update the residual degrees of each k and / degree node. 

13: end for 
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To combine the nearly uniform subgraphs, we start with the largest degree nodes, 
and the corresponding largest degree classes. It is not necessary to start with the 
largest, but it simplifies the proof. First, we note that after every iteration, the joint 

degree sequence is still feasible if\/k,l,ky^l < ^k^i and ^k,k < ( 2*)- 

We will prove that Algorithm 4 can always satisfy the feasibiUty conditions. First, 
we note a fact. 

Observation 1 For all k, Jk,i + Jk,k = Ldj=kdj 

This follows directly from the fact that the left hand side is summing over all of 
the k end points needed by ^ while the right hand side is summing up the available 
residual end points from the degree distribution. Next, we note that if all residual 
degrees for degree k nodes are either or 1, then: 

Observation 2 If, for all j such that dj = k, dj = 0or 1 then 
Ldj=kdj='Ldj=kHdjj^O) = ^k. 

Lemma 1. After every iteration, for every pair of vertices u, v of any degree k, 
\du-dy\ < 1. 

Amanatidis et al. refer to Lemma 1 as the balanced degree invariant. This is 
most easily proven by considering the vertices of degree k as a queue. If there are 
X edges to be assigned, we can consider the process of deciding how many edges 
to assign each vertex as being one of popping vertices from the top of the queue 
and reinserting them at the end x times. Each vertex is assigned edges equal to the 
number of times it was popped. The next time we assign edges with end points of 
degree k, we start with the queue at the same position as where we ended previously. 
It is clear that no vertex can be popped twice without all other vertices being popped 
at least once. 

Lemma 2. The above algorithm can always greedily produce a graph that satisfies 
^ , provided ^ satisfies the initial necessary conditions. 

Proof. There is one key observation about this algorithm - it maximizes ^k^l by 
ensuring that the residual degrees of any two vertices of the same degree never 
differ by more than I . By maximizing the number of available vertices, we can not 
get stuck adding a self-loop or multiple edge. From this, we gather that if, for some 
degree k, there exists a vertex j such that dj = 0, then for all vertices of degree k, 
their residuals must be either or 1 . This means that Y.dj=kdj = ^/t > ^k,i for every 
other / from Observation 2. 

From the initial conditions, we have that for every kj i < &kS!i. &k = 
provided that all degree k vertices have non-zero residuals. Otherwise, for any 
unprocessed pair, ^^,1 < ^^^i^k^^i} < ^k^i- For the k,k case, it is clear that 

^k.k < ^A: < (f*)- Therefore, the residual joint degree matrix and degree sequence 
will always be feasible, and the algorithm can always continue. □ 
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A natural question is that since the joint degree distribution contains all of the 
information in the degree distribution, do the joint degree distribution necessary 
conditions easily imply the Erdos-Gallai condition? This can easily be shown to be 
true. 

Corollary 1. The necessary conditions for a joint degree matrix to be graphical 
imply that the associated degree vector satisfies the Erdos-Gallai condition. 



5 Uniformly Sampling Graphs with Monte Carlo Markov Chain 
(MCMC) Methods 

We now turn our attention to uniformly sampling graphs with a given graphical joint 
degree matrix using MCMC methods. We return to the joint degree matrix configu- 
ration model. We can obtain a starting configuration for any graphical joint degree 
matrix by using Algorithm 1. This configuration consists of one complete bipartite 
component for each degree with a perfect matching selected. The transitions we use 
select any end point uniformly at random, then select any other end point in its de- 
gree neighborhood and swap the two edges that these neighbor. In Figure 2, this is 
equivalent to selecting one of the square endpoints uniformly at random and then se- 
lecting another uniformly at random from the same connected component and then 
swapping the edges. A more complex version of this chain checks that this swap 
does not create a multiple edge or self-loop. Formally, the transition function is a 
randomized algorithm given by Algorithm 2. 



Algorithm 2 Markov Chain Transition Function, Input: a configuration C 

1: With probability 0.5, stay at configuration C. Else: 

2: Select any endpoint ei uniformly at random. It neighbors a vertex vi in configuration C 
3: Select any e2 u.a.r from ei 's degree neighborhood. It neighbors V2 

4: (Optional: If the graph obtained from the configuration with edges £ U {(ei,V2), (e2, vi)} \ 

{(^iiVi), (e2, V2)} contains a multi-edge or self-loop, reject) 
5: £:^EU{(ei,V2),(e2,vi)}\{{ei,vi),(e2,V2)} 



There are two chains described by Algorithm 2. The first, ^ doesn't have step 
(4) and its state space is all pseudographs with the desired joint degree matrix. The 
second, ^ includes step (4) and only transitions to and from simple graphs with the 
correct joint degree matrix. 

We remind the reader of the standard result that any irreducible, aperiodic 
Markov Chain with symmetric transitions converges to the uniform distribution over 
its state space. Both ^ and ^ are aperiodic, due to the self-loop to each state. 
From the description of the transition function, we can see that .2/ is symmetric. 
This is less clear for the transition function of Is it possible for two connected 
configurations to have a different number of feasible transitions in a given degree 
neighborhood? We show that it is not the case in the following lennma. 
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Lemma 3. The transition function of is symmetric. 

Proof. Let C\ and C2 be two neighboring configurations in ^. This means that they 
differ by exactiy 4 edges in exactiy 1 degree neighborhood. Let this degree be k and 
let these edges be ei V] and ^2^2 in C\ whereas they are exvi and ^2^1 in C2. We want 
to show that C\ and C2 have exactly the same number of feasible A:-degree swaps. 

Without loss of generality, let ex.,ey be a swap that is prevented by e\ in C\ but 
allowed in C2. This must mean that ex neighbors vi and ey neighbors some Vy ^ 
vi,V2. Notice that the swap e\ex is currently feasible. However, in C2, it is now 
infeasible to swap e\,ex, even though Cx and ey are now possible. 

If we consider the other cases, like ex, ey is prevented by both e\ and ^2. then after 
swapping ei and 62, Sx, Cy is still infeasible. If swapping ei and 62 makes something 
feasible in Ci infeasible in C2, then we can use the above argument in reverse. This 
means that the number of feasible swaps in a A:-neighborhood is invariant under 
A;-degree swaps. □ 

The remaining important question is the connectivity of the state space over these 
chains. It is simple to show that the state space of sz/ is connected. We note that it is a 
standard result that all perfect matchings in a complete bipartite graph are connected 
via edge swaps [58]. Moreover, the space of pseudographs can be seen exactly as the 
set of all perfect matchings over the disconnected complete bipartite degree neigh- 
borhoods in the joint degree matrix configuration model. The connectivity result 
is much less obvious for We adapt a result of Taylor [58] that all graphs with 
a given degree sequence are connected via edge swaps in order to prove this. The 
proof is inductive and follows the structure of Taylor's proof. 

Theorems. Given two simple graphs, G\ and G2 of the same size with the same 
joint degree matrix, there exists a series of endpoint rewirings to transform Gi into 
G2 (and vice versa) where every intermediate graph is also simple. 

Proof. This proof will proceed by induction on the number of nodes in the graph. 
The base case is when there are 3 nodes. There are 3 realizable JDMs. Each is 
uniquely realizable, so there are no switchings available. 



Fig. 3 The three potential joint degree distributions when n = 3. 

Assume that this is true for n = k. Let Gi and G2 have k+1 vertices. Label the 
nodes of Gi and G2 vi • • • Vi+i such that deg{vi) > deg{v2) > ■ ■ ■ > deg{vk+i). Our 
goal will be to show that both graphs can be transformed in G[ and Gj respectively 
such that vi neighbors the same nodes in each graph, and the transitions are all 
through simple graphs. Now we can remove vi to create G" and Gj, each with n — 1 
nodes and identical JDMs. By the inductive hypothesis, these can be transformed 
into one other and the result follows. 




12 



Isabelle Stanton and Ali Pinar 



Uf 



Vc 

/ 

\ 

Ui+l 



Fig. 4 The dotted edges represent the trou- 
blesome edges that we may need to swap out 
before we can swap vi and v<;. 




^2 / \ Mdi-1 

Fig. 5 The disk is vi . The crosses are ei ■ ■ ■ ej^ . 



We will break the analysis into two cases. For both cases, we will have a set 
of target edges, ei,e2---erfi that we want vi to be connected to. Without loss of 
generality, we let this set be the edges that vi currently neighbors in G2. We assume 
that the edges are ordered in reverse lexicographic order by the degrees of their 
endpoints. This will guarantee that the resulting construction for vi is graphical and 
that we have a non-increasing ordering on the requisite endpoints. Now, let kj denote 
the endpoint in G2 for edge e, that isn't vi . 

Case 1) For the first case, we will assume that vi is already the endpoint of all 

edges ei , 62 • • • but that all of the ki may not be assigned correctly as in Figure 5. 
Assume that ei,e2---e,-i are all edges {vi,ki)---{vi,ki-i) and that e,- is the first 
that isn't matched to its appropriate kf. 

Call the current endpoint of the other endpoint of e, m,. We know that deg{ki) = 
deg(ui) and that ki currently neighbors deg{ki) other nodes, r[ki). We have two 
cases here. One is that v\ G r{ki) but via edge / instead of e,. Here, we can swap vi 
on the endpoints of / and e, so that the edge vi — e, — ki is in the graph. / can not 
be an ej where j < i because those edges have their correct endpoints, kj assigned. 
This is demonstrated in Figure 6. 

The other case is that vi <^r{ki). If this is the case, then there must exist some x € 
r{ki) \r{ui) because d{ui) = d{ki) and m, neighbors vi while ki doesn't. Therefore, 
we can swap the edges vi — e,- — m, and x — / — A:, to vi — e,- — ki and x — f—Ui without 
creating any self-loops or multiple edges. This is demonstrated in Figure 6. 




Fig. 6 The two parts of Case (1). Fig. 7 The two parts of Case (2) 



Therefore, we can swap all of the correct endpoints onto the correct edges. 

Case 2) For the second case, we assume that the edges ei , • • • e^i are distributed 
over ; nodes of degree di. We want to show that we can move all of the edges 
ei • • • e^/j so that vi is an endpoint. If this is achievable, we have exactly Case 1 . 
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Let ei , • • • e,_i be currently matched to v, and let e, be matched to some x such 
that deg{x) = d\ . Let / be an edge currently matched to v'l that is not part of ei • • • e^j 
and let its other endpoint be Uf. Let the other end point of e; be Ux as in Figure 7. 

We now have several initial cases that are all easy to handle. First, if v,x,Ux,Uf 
are all distinct and (v, u^) and {x,Uf) are not edges then we can easily swap v and x 
such that the edges go from v — / — m/ and x — Ci — Ux iov — ei — Ux andx — / — m/. 
Next, if Uf = Ux then we can simply swap vi onto e, and x onto / and, again, vi will 
neighbor e,. This will not create any self-loops or multiple edges because the graph 
itself will be isomorphic. This situations are both shown in Figure 7. 

The next case is that x = m/. If we try to swap vi onto e, then we create a self- 
loop from X to X via /. Instead, we note that since the JDM is graphical, there must 
exist a third vertex y of the same degree as vi and x that does not neighbor x. Now, y 
neighbors an edge g, and we can swap x — / and y — gXox — g and y — f. The edges 
are vi — f — y and x — e,- — m; and e,- can be swapped onto vj without conflict. 

The cases left to analyze are those where the nodes are all distinct and (vi , Ux) or 
{x,Uf) are edges in the graph. We will analyze these separately. 

Case 2a) If (vi , u^) is an edge in the graph, then it must be so through some edge 
named g. Note that this means we have vi—g — Ux and x — e,- — Mj^. We can swap this 
to vi — e, — Ux and x — g — Ux and have an isomorphic graph provided that g is not 
some ej where j < i. This is the top case in Figure 8. 

If g is some Cj then it must be that Ux = kj. This is distinct from ki. deg{kj) = 
deg{ki) so there must exist some edge h that ki neighbors with its other endpoint 
being y. There are again three cases, when y ^ x, vi y = x and when y = vi . These 
are the bottom three rows illustrated in Figure 8. The first is the simplest. Here, we 
can assume that kj does not neighbor y (because it neighbors vi and x that ki does 
not) so we can swap kj onto h and ki onto e\. This has removed the offending edge, 
and we can now swap vi onto ei and x onto /. 

When = X, we first swap ki onto Cj and kj onto h. Next, we swap v onto e, and 
X onto / as they no longer share an offending edge. 

Finally, when j = vi, we use a sequence of three swaps. The first is ki onto Cj 
and kj onto h. The next is vi onto ei and x onto h. Finally, we swap kj back onto ej 
and ki onto e,. 

Case 2b) If {x,Uf ) is an edge in the graph, then it must be through some edge g 
such that x — g — Uf and x — e,- — Without loss of generaUty, assume that / is the 
only edge neighboring V] that isn't an Cj. Since / doesn't neighbor V| in Gi, there 
must either exist a w with deg{w) = deg{uf) or with deg{vs) = d{vi). This relies 
critically upon the fact that / and g are the same class edge. If there is a w, then it 
doesn't neighbor vi (or we can apply the above argument to find a w') and it must 
have some neighbor y G r[w)\r{u) through edge h. Therefore, we can swap uj 
onto h and w onto /. This removes the offending edge, and we can now swap vi 
onto e, and x onto /. 

If Vg exists instead, then by the same argument, there exists some edge h with 
endpoint Us such that ^r{uf) and Ug ^ r(x). Therefore, we can swap Vg — h and 
x — gto Vs — g and x — h. This again removes the troublesome edge and allows us to 
swap vi onto e,-. 
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v\ X V ^1 X V Vi X V 




Fig. 9 A graphical representation of the situations discussed in Case (2b) 



Therefore, given any node, a precise set of edges that it should neighbor, and a 
set of vertices that are the endpoints of those edges, we can use half-edge-rewirings 
to transform any graph G to G' that has this property, provided the set of edges is 
graphical. □ 

Now that we have shown that both s/ and ^ converge to the uniform distribution 
over their respective state spaces, the next question is how quickly this happens. 
Note that from the proof that the state space of ^ is connected, we can upperbound 
the diameter of the state space by 3ot. The diameter provides a lower bound on the 
mixing time. In the next section, we will empirically estimate the mixing time to be 
also linear in m. 
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6 Estimating the Mixing Time of the Markov Chain 



The Markov chain is very similar to one analyzed by Kannan, Tetali and Vem- 
pala [26]. We can exactly use their canonical paths and analysis to show that the 
mixing time is polynomial. This result follows directly from Theorem 3 of [26] for 
chain . This is because the joint degree matrix configuration model can be viewed 
as 1^1 complete, bipartite, and disjoint components. These components should re- 
main disjoint, so the Markov Chain can be viewed as a 'meta-chain' which samples 
a component and then runs one step of the Kannan, Tetali and Vempala chain on 
that component. Even though the mixing time for this chain is provably polynomial, 
this upper bound is too large to be useful in practice. 

The analysis to bound the mixing time for chain is significantly more com- 
plicated. One approach is to use the canonical path method to bound the congestion 
of this chain. The standard trick is to define a path from G\ to G2 that fixes the 
misplaced edges identified by G\ © G2 in a globally ordered way. However, this is 
difficult to apply to chain because fixing a specific edge may not be atomic, i.e. 
from the proof of Theorem 3 it may take up to 4 swaps to correctly cormect a vertex 
with an endpoint if there are conflicts with the other degree neighborhoods. These 
swaps take place in other degree neighborhoods and are not local moves. There- 
fore, this introduces new errors that must be fixed, but can not be incorporated into 
Gi ® G2. In addition, step (4) also prevents us from using path coupUng as a proof 
of the mixing time. 

Given that bounding the mixing time of this chain seems to be difficult with- 
out new techniques or ideas, we use a series of experiments that substitute the 
autocorrelation time for the mixing time. 



6.1 Autocorrelation Time 

Autocorrelation time is a quantity that is related to the mixing time and is popular 
among physicists. We wiU give a brief introduction to this concept, and refer the 
reader to Sokal's lecture notes for further details and discussion [56]. 

The autocorrelation of a signal is the cross-correlation of the signal with itself 
given a lag f . More formally, given a series of data (X,) where each X, is a drawn 
from the same distribution X with mean \i and variance (7, the autocorrelation func- 

tionis/;x(0 = ^"''-"^ff-'"^^' - 

Intuitively, the inherent problem with using a Markov Chain sampling method is 
that successive states generated by the chain may be highly correlated. If we were 
able to draw independent samples from the stationary distribution, then the autocor- 
relation of that set of samples with itself would go to as the number of samples 
increased. The autocorrelation time is capturing the size of the gaps between sam- 
pled states of the chain needed before the autocorrelation of this 'thinned' chain 
is very small. If the thinned chain has autocorrelation, then it must be exactly 
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sampled from the stationary distribution. In practice, when estimating the autocor- 
relation from a finite number of samples, we do not expect it to go to exactly 0, but 
we do expect it to 'die away' as the number of samples and gap increases. 

Definition 3. The exponential autocorrelation time is Xexp^ = 1™ sup,_^„ -iog|W(OI ^''^^ ' 

Definition 4. The integrated autocorrelation time is Ti,ux = ^TT=-'>o^x{t) = j + 
U=iMt) [56]. 

The difference between the exponential autocorrelation time and the integrated 
autocorrelation time is that the exponential autocorrelation time measures the time 
it takes for the chain to reach equilibrium after a cold start, or 'bum-in' time. The 
integrated autocorrelation time is related to the increase in the variance over the 
samples from the Markov Chain as opposed to samples that are truly independent. 
Often, these measurements are the same, although this is not necessarily true. 

We can substitute the autocorrelation time for the mixing time because they are, 
in effect, measuring the same thing - the number of iterations that the Markov Chain 
needs to run for before the difference between the current distribution and the sta- 
tionary distribution is small. We will use the integrated autocorrelation time esti- 
mate. 



6.2 Experimental Design 

We used the Markov Chain £§ in two different ways. First, for each of the smaller 
datasets, we ran the chain for 50,000 iterations 15 times. We used this to calculate 
the the autocorrelation values for each edge for each lag between 100 and 15,000 in 
multiples of 100. From this, we calculated the estimated integrated autocorrelation 
time, as well as the iteration time for the autocorrelation of each edge to drop under 
a threshold of 0.001. This is discussed in Section 6.4. 

We also rephcated the experimental design of Raftery and Lewis [52]. Given 
our estimates of the autocorrelation time for each size graph in Section 6.4, we ran 
the chain again for long enough to capture 10,000 samples where each sample had 
X iterations of the chain between them, x was chosen to vary from much smaller 
than the estimated autocorrelation time, to much larger. From these samples, we 
calculated the sample mean for each edge, and compared it with the actual mean 
from the joint degree matrix. We looked at the total variational distance between 
the sample means and actual means and showed that the difference appears to be 
converging to 0. We chose the mean as an evaluation metric because we were able 
to calculate the true means theoretically. We are unaware of another similarly simple 
metric. 

We used the formulas for empirical evaluation of mixing time from page 14 of 
Sokal's survey [56]. In particular, we used the following: 

• The sample mean is ju = ^ ^4=1 ^i- 
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• The sample uimonnalized autocorrelation function is C{t) = ^ ^4=1 i^i ^ M) (-^r+i ~ 

n 

• The natural estimator of isp(f) =C(f)/C(0) 

• The estimator for Timj is f,„( = 5 L"=l(„_i) ■^(Op(0 where A is a 'suitable' 
cutoff function. 

For a sequence of length x, calculating the autocorrelation of gap f requires (x — 
t)^ dot products. Our experiments require that we calculate the autocorrelation for 
each possible edge in a graph for many lags. Thus running the full set of experiments 
requires (9(|ypxlogx) time and is prohibitive when V is large. Note that x must 
necessarily be at least & {E) as well, since the mixing time can not be sub-linear in 
the number of edges. In Section 6.3 we will discuss results on the smaller datasets 
(AdjNoun, Dolphins, Football, Karate, and LesMis) that suggest a more feasible 
method for estimating autocorrelation time for larger graphs. We use this method to 
evaluate the autocorrelation time for the larger graphs as well, and present all of the 
results together. Rather than running the chain for 15,000 steps for the larger graphs, 
we selected more appropriate stopping conditions that were generally 10|£| based 
on the results for smaller graphs. 



Data Sets 

We have used several publicly available datasets. Word Adjacencies [48], Les Mis- 
erables [29], American College Football [20], the Karate Club [63], the Dolphin 
Social Network [36], C. Elegans Neural Network (celegans) [60, 62], Power grid 
(power) [61], Astrophysics collaborations (astro-ph) [44], High-Energy Theory col- 
laborations (hep-th) [45], Coauthorships in network science (netscience) [49], and 
a snapshot of the Intemet from 2006 (as-22july) [50]. In the following \V\ is the 
number of nodes, \E\ is the number of edges and |^| is the number of non-zero 
entries in the joint degree matrix. 



Dataset 


\E\ 




\/\ 


AdjNoun 


425 


112 


159 


as-22july 


48,436 


22,962 


5,496 


astro-ph 


121,251 


16,705 


11,360 


celegans 


2,359 


296 


642 


Dolphins 


159 


62 


61 


Football 


616 


115 


18 


hep-th 


15,751 


8,360 


629 


Karate 


78 


34 


40 


LesMis 


254 


77 


99 


netscience 


2,742 


1,588 


184 


power 


6,594 


4,940 


108 



Table 1 Details about the datasets, | V| is the number of nodes, |£| is the number of edges and | ^ \ 
is the number of unique entries in the 
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6.3 Relationship Between Mean of an Edge and Autocorrelation 

For each of the smaller graphs, AdjNoun, Dolphins, Football, Karate and LesMis, 
we ran the Markov Chain 10 times for 50,000 iterations and collected an indicator 
variable for each potential edge. For each of these edges, and each run, we calculated 
the autocorrelation function for values of t between 100 and 15,000 in multiples of 
100. For each edge, and each run, we looked at the t value where the autocorrelation 
function first dropped below the threshold of 0.001. We then plotted the mean of 
these values against the mean of the edge, i.e. if it connects vertices of degree c/, and 
dj (where di ^ dj) then jig = ^di,dj/didj or ji^ = ^di4il (2) otherwise. The three 
most useful plots are given in Figures 10 and 1 1 as the other graphs did not contain 
a large range of mean values. 




500 'MO 1500 20M 3500 3CO0 35CK) VfX 45CK) 50O 1M0 1500 £000 2500 3000 3500 40CK) 4500 50CK) 



Fig. 10 The time for an edge's estimated autocorrelation function to pass under the threshold of 
0.001 versus \le for that edge for LesMis and AdjNoun from L to R. 

From these results, we identified a potential relationship between [le and the time 
to pass under a threshold. Unfortunately, none of our datasets contained a significant 
number of edges with larger jUg values, i.e. between 0.5 and 1. In order to test this 
hypothesis, we designed a synthetic dataset that contained the many edges with 
values of at ^ for J = 1 , • • • 20. We describe the creation of this dataset in the 
appendix. 

The final dataset we created had 326 edges, 194 vertices and 21 distinct ^ en- 
tries. We ran the Markov Chain 200 times for this synthetic graph. For each run, we 
calculated the threshold value for each edge. Figure 1 1 shows the edges' mean vs 
its mean time for the autocorrelation value to pass under 0.001. We see that there is 
a roughly symmetric curve that obtains its maximum at /ig = 0.5. 

This result suggests a way to estimate the autocorrelation time for larger graphs 
without repeating the entire experiment for every edge that could possibly appear. 
One can calculate jUg for each edge from the JDM and sample edges with jtig around 
0.5. We use this method for selecting our subset of edges to analyze. In particular, 
we sampled about 300 edges from each of the larger graphs. For all of these except 
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Fig. 11 The time for an edge's estimated autocorrelation function to pass under the threshold of 
0.001 versus for that edge for Karate and the synthetic dataset. The synthetic dataset has a larger 
range of /i^ values than the real datasets and a significant number of edges for each value. 



for power, the values were between 0.4 and 0.6. For power, the maximum /i^, 
value is about 0.15, so we selected edges with the largest fi values. 



6.4 Autocorrelation Values 

For each dataset and each run we calculated the unnormalized autocorrelation val- 
ues. For the smaller graphs, this entailed setting f to every value between 100 and 
15,000 in multiples of 100. We randomly selected 1 run for each dataset and graphed 
the autocorrelation values for each of the edges. We present the data for the Karate 
and Dolphins datasets in Figures 12 and 13. For the larger graphs, we changed the 
starting and ending points, based on the graph size. For example, for Netscience was 
analyzed from 2,000 to 15,000 in multiples of 100, while as-22july was analyzed 
from 1,000 to 500,000 in multiples of 1,000. 



-0.0S - 
-0.04— 



Fig. 12 The exponential drop-off for Karate Fig. 13 The exponential drop-off for Dol- 

appears to end after 400 iterations. phins appears to end after 600 iterations. 



20 



Isabelle Stanton and Ali Pinar 



All of the graphs exhibit the same behavior. We see an exponential drop off ini- 
tially, and then the autocorrelation values oscillate around 0. This behavior is due to 
the htnited number of samples, and a bias due to using the sample mean for each 
edge. If we ignore the noisy tail, then we estimate that the autocorrelation 'dies off' 
at the point where the mean absolute value of the autocorrelation approximately 
converges, then we can locate the 'elbow' in the graphs. This estimate for all graphs 
is given in Table 3 at the end of this Section. 



6.5 Estimated Integrated Autocorrelation Time 

For each dataset and run, we calculated the estimated integrated autocorrelation 
time. For the datasets with fewer than 1,000 edges, we calculated the autocorrelation 
in lags of 100 from 100 to 15,000 for each dataset. For the larger ones, we used inter- 
vals that depended on the total size of the graph. We estimate p{t) as the size of the 
intervals times the sum of the values. The cut-off function we used for the smaller 
graphs was A(f) = lifO</< 15,000 and otherwise. This value was calculated 
for each edge. In Table 2 we present the mean, maximum and minimum estimated 
integrated autocorrelation time for each dataset over the runs of the Markov Chain 
using three different methods. For each of the edges, we first calculated the mean, 
median and max estimated integrated autocorrelation value over the various runs. 
Then, for each of these three values for each edge, we calculated the max, mean and 
min over all edges. For each of the graphs, the data series representing the median 
and max have each had their x- values perturbed slightly for clarity. 



Dataset 


\E\ 


mean 


max 


min 


median 


max 


min 


maximum 


max 


min 


Karate 


78 


288.92 


444.1 


221.13 


288.31 


443 


217.63 


382.59 


608.06 


268.95 


Dolphins 


159 


383.21 


553.84 


256.13 


377.4 


550.99 


211.44 


528.86 


1134.1 


397.35 


LesMis 


254 


559.77 


931.35 


129.45 


542.43 


895.57 


57.492 


894.08 


2598.6 


342.76 


AdjNoun 


425 


688.71 


1154.9 


156.49 


659.06 


1160.3 


66.851 


1186.1 


4083.6 


350.97 


Football 


616 


962.42 


2016.9 


404.77 


925.97 


1646.9 


349.12 


1546.4 


7514.3 


967 


celegans 


2359 


3340.2 


4851.4 


2458.8 


3235.7 


4861.4 


2323.6 


4844.6 


7836.9 


3065.5 


netscience 


2742 


1791.4 


3147.2 


1087.7 


1658.3 


3033.2 


937.8382 


3401 


7404 


1894.7 


power 


6594 


6624.5 


17933 


2166.9 


4768.8 


16901 


250.6012 


20599 


54814 


7074.7 


hep-th 


15751 


26552 


36816 


14976 


25608 


37004 


14130 


46309 


64936 


25753 


as-22july 


48436 


89637 


139280 60627 


87190 


152490 


58493 


121930 


256520 


76214 


astro-ph 


121251 


121860 298970 37706 


119900 321730 


46830 


152930 


408000 


84498 



Table 2 A summary of all the estimated integrated autocorrelation times. Mean refers to taking the 
mean autocorrelation time for each edge, and then the mean, min and max of these values over all 
measured edges. Similarly, median is the median value for each edge, while max is the maximum 
for each edge. 



These values are graphed on a log-log scale plot. Further, we also present a graph 
showing the ratio of these values to the number of edges. The ratio plot. Figure 15, 
suggests that the autocorrelation time may be a Unear function of the number of 



Constructing and Sampling Graphs with a Prescribed Joint Degree Distribution 



21 



edges in the graph, however the estimates are noisy due to the hmited number of 
runs. 

Log-Log Plot of Estimated Integrated Autocorrelation Time 




3 3.5 4 

Log(Number of Edges) 



5.5 



Fig. 14 The max, median and min values over the edges for the est. int. autocorrelation times in 
a log-log plot. L to R in order of size: Karate, Dolphins, LesMis, AdjNoun, Football, celegans, 
netscience, power, hep-th, as-22july and astro-ph 



All three metrics give roughly the same picture. We note that there is much higher 
variance in estimated autocorrelation time for the larger graphs. If we consider the 
evidence of the log-log plot and the ratio plot, we suspect that the autocorrelation 
time of this Markov Chain is linear in the number of edges. 



6.6 The Sample Mean Approaches the Real Mean for Each Edge 

Given the results of the previous experiment estimating the integrated autocorrela- 
tion time, we next executed an experiment suggested by Raftery and Lewis [52]. 
First we note that for each edge e, we know the true value of P{e e G\G has ^) 

is exactly or if e is an edge between degrees k and I. This is because 

there are S'k^i potential {k,l) edges that show up in any graph with a fixed ^ , and 
each graph has i of them. If we consider the graphs as being labeled, then we 
can see that each edge has an equal probability of showing up when we consider 
permutations of the orderings. 
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Ratio of Estimated Integrated Autocorreiation Time to Number of Edges 




6 8 10 

Number Identifying Dataset 

Fig. 15 The ratio of the max, median and min values over the edges to the number of edges for 
the estimated integrated autocorrelation times. L to R in order of size: Karate, Dolphins, LesMis, 
AdjNoun, Football, celegans, netscience, power, hep-th, as-22july and astro-ph 



Thus, our experiment was to take samples at varying intervals, and consider how 
the sample mean of each edge compared with our known theoretical mean. For the 
smaller graphs, we took 10,000 samples at varying gaps depending on our esti- 
mated integrated autocorrelation time and repeated this 10 times. Additionally, we 
saw that the total variational distance quickly converged to a small, but non-zero 
value. We repeated this experiment with 20,000 samples and, for the two smallest 
graphs. Karate and Dolphins, we repeated the experiment with 5,000 and 40,000 
samples. These results show that this error is due to the number of samples and not 
the sampler. For the graphs with more than 1,000 edges, each run resulted in 20,000 
samples at varying gaps, and this was repeated 5 times. We present these results in 
Figures 18 through 28. If Se.g is the sample mean for edge e and gap g, and jj.^ is the 
true mean, then the graphed value is \Se,g — Mi? I / M<?- 

In all of the figures, the line runs through the median error for the runs and the 
error bars are the maximum and minimum values. We note that the maximum and 
minimum are very close to the median as they are within 0.05% for most intervals. 
These graphs imply that we are sampling uniformly after a gap of 175 for the Karate 
graph. For the dolphin graph, we see very similar results, and note that the error 
becomes constant after a sampling gap of 400 iterations. 

For the larger graphs, we varied the gaps based on the graph size, and then fo- 
cused on the area where the error appeared to be decreasing. Again, we see con- 
sistent results, although the residual error is higher. This is to be expected because 
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Fig. 16 The Dolphin Dataset with 5,000 to 
40,000 samples 




Fig. 18 The AdjNoun Dataset with 10,000 and 
20,000 samples 





Fig. 17 The Karate Dataset with 5,000 to 
40,000 samples 




Fig. 19 The AS-22July06 Dataset with 20,000 
samples 




Fig. 20 The Astro-PH Dataset with 20,000 Fig. 21 The Celegans Dataset with 20,000 
samples samples 
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Fig. 22 The Football Dataset with 10,000 and 
20,000 samples 



Peicem Enor or Total Variational Distance, Leamis 




Number of Iterations Between Samples 



Fig. 24 The LesMis Dataset with 10,000 and 
20,000 samples 



Percent Enw of Total Variational Distance, Power 
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Fig. 23 The Hep-TH Dataset with 20,000 sam- 
ples 



Percent Enn of Total Variational Distance, Netsclence 




Fig. 25 The Netscience Dataset with 20,000 
samples 



Fig. 26 The Power Dataset with 20,000 sam- 
ples 
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there are more potential edges in these graphs, so we took relatively fewer samples 
per edge. A summary of the results can be foimd in Table 3. 



6.7 Summary of Experiments 





1^1 


Max EI 


Mean Conv. 


Thresh. 


AdjNoun 


425 


1186 


900 


700 


AS-22July 


48,436 


256,520 


95,000 


156,744 


Astro-PH 


121,251 


408,000 


120,000 


343,154 


Celegans 


2,359 


7836.9 


3,750 


7,691 


Dolphins 


159 


528 


400 


600 


Football 


616 


1546 


1000 


900 


Hep-TH 


15,751 


64,936 


28,000 


22,397 


Karate 


78 


382 


175 


400 


LesMis 


254 


894 


800 


1000 


Netscience 


2,742 


7,404 


2,000 


7,017 


Power 


6,594 


54,814 


8,000 


7,270 



Table 3 A summary of estimates on convergence from the three experiments. The values are 
the Maximum Estimated Integrated Autocorrelation time (Max EI, the third column of Table 2), 
the Sample Mean Convergence iteration number, and the time to drop under the Autocorrelation 
Threshold. The Autocorrelation threshold was calculated as when the average absolute value of the 
autocorrelation was less than 0.0001 



Based on the results in this table, our recommendation would be that running 
the Markov Chain for 5m steps would satisfy all running time estimates except for 
Power's results for the Maximum Estimated Integrated Autocorrelation time. This 
estimate is significantly lower than the result for Chain that was obtained using 
the standard theoretical technique of canonical paths. 



7 Conclusions and Future Work 

This paper makes two primary contributions. The first is the investigation of Markov 
Chain methods for uniformly sampling graphs with a fixed joint degree distribution. 
Previous work shows that the mixing time of is polynomial, while our experi- 
ments suggest that the mixing time of ^ is also polynomial. The relationship be- 
tween the mean of an edge and the autocorrelation values can be used to efficiently 
experiment with larger graphs by sampling edges with mean between 0.4 and 0.6 
and repeating the analysis for just those edges. This was used to repeat the exper- 
iments for larger graphs and to provide further convincing evidence of polynomial 
mixing time. 
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Our second contribution is in the design of the experiments to evaluate the mixing 
time of the Markov Chain. In practice, it seems the stopping time for sampling is 
often chosen without justification. Autocorrelation is a simple metric to use, and 
can be strong evidence that a chain is close to the stationary distribution when used 
correctly. 
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8 Appendix 



Designing Synthetic Data 

Our goal was to represent all of the potential means for ^ for < j < 20. We note 
that 20 factors into 4 and 5, so we want to first fix some degrees such that f^^. = 4 
and ^1 = 5. For convenience, because the maximum number of edges we will be 
assigning is 20, we will pick these degrees to be A" = {20,21,22,23,24} for ^k = ^ 
and L = {25 , 26 , 27 , 28 } for 5?/ = 5 . The number of each we picked was to guarantee 
that there were at least 20 combinations of edge types. We can now assign the values 
1 — 20 arbitrarily to ^kxl- This assignment clearly satisfies that ^^^i < ^k^i so 
far. 
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Now, we must fill in the rest of ^ so that Q is integer valued for degrees. One 
way is to note that we should have 4 x 20 degree 20 edges. We can sum the num- 
ber of currently allocated edges with one endpoint of degree 20, call this x and set 
,^1,20 = 80 — X. There are many other ways of consistently completing ^ , such as 
assigning as many edges as possible to the KxK and LxL entries, like ^20.21 ■ This 
results in a denser graph. For the synthetic graph used in this paper, we completed 
^ by adding aU edges as (1,20), (1,21) etc edges. We chose this because it was 
simple to verify and it also made it easy to ignore the edges that were not of interest. 



